1. GPR System

GPR data from the GEO-CENTERS Energy Focusing Ground Penetrating Radar (EFGPR) system were
used in these experiments. The EFGPR is a time-domain impulse radar system that includes an array
of antennas, a synchronous high-speed interface model, a GPS system, and a host computer for control
and processing. It uses delayed signals in a wide multi-element array to focus the transmitted and
received signals in order to locate targets in the soil.

GEO-CENTERS developed several models of the EFGPR. The model used in this project is the Model 401
EFGPR, a portable, three-wheeled humanitarian demining system. It deploys Rolled Edge Transverse
Electromagnetic (RETEM) antennas that provide increases in both gain and upper bandwidth, along with
improvements in RF components to take advantage of the enhanced radar bandwidth. It supports a
single 1.5 m, 6-antenna-pair array to cover a 1 m detection swath. The center frequency of the EFGPR
is 1.25 GHz. The Model 401 EFGPR saves the GPR data to disk for off-line analysis.

A raster scan is generated every 5 cm as the system advances. Each scan is stored in a 25 x 40 x 12-bit
radar image format that represents a ground slice. The 25 cross-track pixels represent the 1 m wide
swath, and the 40 pixels represent time or depth. Figure 1 is an illustration of a landmine signature.
The down-track direction is the direction of vehicle motion. The cross-track direction is represented
by the variable x, the down-track direction by the variable y, and the depth by the variable z.
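The scan geometry described above can be sketched in a few lines of Python. This is only an illustration: the constant and function names are hypothetical and are not part of the EFGPR software. Only the 5 cm scan spacing, the 25 x 40 pixel slice, and the 12-bit sample depth come from the text.

```python
# Illustrative sketch of the scan layout (all names are hypothetical).
SCAN_SPACING_M = 0.05      # one raster scan every 5 cm of down-track travel
CROSS_TRACK_PIXELS = 25    # x: pixels across the 1 m swath
DEPTH_PIXELS = 40          # z: time (depth) samples per cross-track pixel
MAX_SAMPLE = 2**12 - 1     # largest value of a 12-bit sample


def scans_for_distance(distance_m):
    """Number of raster scans recorded over a down-track distance."""
    return round(distance_m / SCAN_SPACING_M)


def empty_cube(num_scans):
    """Allocate cube[y][x][z]: down-track scan, cross-track pixel, depth sample."""
    return [[[0] * DEPTH_PIXELS for _ in range(CROSS_TRACK_PIXELS)]
            for _ in range(num_scans)]


# Data volume for a 25 m down-track run.
cube = empty_cube(scans_for_distance(25.0))
```

Indexing the cube as `cube[y][x][z]` keeps the down-track position as the outermost index, which matches the order in which scans arrive as the vehicle advances.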


[Figure 1: axes are x (cross-track), y (down-track), and z (depth).]

Figure 1: An Illustration of Three-dimensional Landmine Signature

2. JUXOCO Site


The GEO-CENTERS GPR collected data at a JUXOCO calibration grid in May 2001. The grid has since
been dismantled. The calibration grid consisted of a 5 m x 25 m plot. The grid was divided into 5 lanes
side by side, referred to as lanes A, B, C, D, and E. Each lane was 25 m long and 1 m wide. Each lane
was uniformly divided into 25 grid squares. At the center of each square, a landmine or clutter object
was buried at one of various depths, or the square was left empty. In lane A and lane C, different types
of clutter were buried at various depths in the grid squares. Anti-personnel (AP) landmines were buried
in some grid squares of lane B, and anti-tank (AT) landmines in lane D. In lane E, AP and AT landmines,
landmine simulants, and several clutter objects were buried in most of the squares. Table 1 is an
enumeration of the objects buried in the grid. The weights of the metal clutter objects range from
0.41 g to 306.00 g. The non-metal clutter objects include wood, stones, a plastic spray paint cap, and a
filled hole.


            Landmines/Landmine Simulants               Objects
            AT Landmines  AP Landmines  Landmine       Non-Metal  Metal
                                        Simulants      Objects    Objects
Count       17            10            2              17         16
Total       29                                         33

Table 1: Objects buried in the calibration grid. The other grid squares contained nothing.

3.  Hidden  Markov  Models 


3.1. HMM Description

Hidden  Markov  models  are  stochastic  models  for  stochastic  processes  that  produce  time  sequences  of 
random  observations  as  a  function  of  states.  Transitions  among  the  states  are  governed  by  a  set  of 
probabilities  called  transition  probabilities.  In  a  particular  state,  an  output  or  observation  can  be  generated 
according  to  the  associated  probability  distribution.  It  is  only  the  output,  not  the  state,  that  is  visible  to  an 
external  observer.  Therefore  states  are  hidden  or  not  observable  to  the  outside.  Although  HMMs  are 
described  elsewhere,  e.g.  [1-3],  we  provide  some  description  here  for  completeness. 

The input to the HMM is the observation sequence, which is a sequence of feature vectors
$O = o_1 o_2 \cdots o_T$. The number of states of the model is given by $N$, the individual states by
$S = \{S_1, S_2, \ldots, S_N\}$, and the current state by $q_t$. For a discrete HMM, the number of distinct
observation symbols is $M$, and the individual symbols are given by $V = \{v_1, v_2, \ldots, v_M\}$. Every
HMM has a set of state transition probabilities $A = \{a_{ij}\}$ given by

$$a_{ij} = p(q_{t+1} = S_j \mid q_t = S_i), \quad 1 \le i, j \le N \tag{1}$$

Transition probabilities should satisfy the normal stochastic constraints,

$$a_{ij} \ge 0 \quad \text{and} \quad \sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i, j \le N \tag{2}$$

A discrete HMM has a set of discrete, conditional probability density functions, one for each
state, which can be used to form the matrix $B = \{b_j(k)\}$, where

$$b_j(k) = p(o_t = v_k \mid q_t = S_j), \quad 1 \le j \le N, \; 1 \le k \le M \tag{3}$$

and $o_t$ is the element of the observation sequence at time $t$. The following stochastic constraints
must be satisfied:

$$b_j(k) \ge 0 \quad \text{and} \quad \sum_{k=1}^{M} b_j(k) = 1, \quad 1 \le j \le N, \; 1 \le k \le M \tag{4}$$
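The stochastic constraints on the transition and observation probabilities can be checked mechanically. The sketch below is illustrative only; the helper name and the small two-state model are assumptions made for the example and do not come from the experiments.

```python
def is_row_stochastic(matrix, tol=1e-9):
    """True if all entries are non-negative and every row sums to 1."""
    return all(
        all(p >= 0.0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )


# Hypothetical 2-state model with 3 observation symbols.
A = [[0.9, 0.1],
     [0.2, 0.8]]            # A[i][j] = a_ij
B = [[0.7, 0.2, 0.1],
     [0.1, 0.3, 0.6]]       # B[j][k] = b_j(k)

valid_model = is_row_stochastic(A) and is_row_stochastic(B)
```

A tolerance is used rather than an exact comparison because sums of floating-point probabilities rarely equal 1 exactly.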

A continuous HMM uses continuous probability density functions. In this case we specify the
parameters of the probability density function. Usually the probability density is approximated by a
weighted sum of $M$ Gaussian distributions $\mathcal{N}$,

$$b_j(o) = \sum_{m=1}^{M} c_{jm} \, \mathcal{N}(o; \mu_{jm}, \Sigma_{jm}), \quad 1 \le j \le N \tag{5}$$

where the $c_{jm}$ are the mixture coefficients, $\mu_{jm}$ the mean vectors, and $\Sigma_{jm}$ the covariance
matrices. The coefficients $c_{jm}$ must satisfy the stochastic constraints

$$c_{jm} \ge 0 \quad \text{and} \quad \sum_{m=1}^{M} c_{jm} = 1, \quad 1 \le j \le N, \; 1 \le m \le M \tag{6}$$
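As a rough illustration of a Gaussian-mixture state density, the following sketch evaluates a weighted sum of Gaussians at one observation vector. It assumes diagonal covariance matrices purely to keep the example short; the text does not say which covariance structure was used, and the function names are hypothetical.

```python
import math


def gaussian_diag(o, mean, var):
    """Density of a Gaussian with diagonal covariance, evaluated at vector o."""
    log_p = 0.0
    for x, m, v in zip(o, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return math.exp(log_p)


def mixture_density(o, weights, means, variances):
    """Weighted sum of Gaussian densities for a single state."""
    return sum(c * gaussian_diag(o, mu, var)
               for c, mu, var in zip(weights, means, variances))


# One state with two mixture components over 2-dimensional observations.
density = mixture_density(
    [0.5, -0.2],
    weights=[0.6, 0.4],
    means=[[0.0, 0.0], [1.0, -1.0]],
    variances=[[1.0, 1.0], [0.5, 2.0]],
)
```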

The initial state distribution is given by $\pi = \{\pi_i\}$, where $\pi_i = p(q_1 = S_i)$, $1 \le i \le N$.

Therefore we can use the compact notation $\lambda = (A, B, \pi)$ to denote an HMM with discrete probability
distributions, and $\lambda = (A, c_{jm}, \mu_{jm}, \Sigma_{jm}, \pi)$ to denote one with continuous densities.

Given the observation sequence $O = o_1 o_2 \cdots o_T$, the output of an HMM is computed using the
Viterbi algorithm, which finds the optimal state sequence $q^*$ and produces the quantity
$\log p(O, q^* \mid \lambda)$ [1]. This quantity can be thought of as representing the probability of the
observation sequence given the model.
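To make the Viterbi output concrete, here is a minimal sketch for a discrete HMM that returns the log of the joint probability of the observations and the best state path, together with that path. It is an illustration under simplifying assumptions (a discrete model, a single sequence, plain Python lists), not the implementation used in the experiments; all names are hypothetical.

```python
import math


def viterbi_log_score(obs, pi, A, B):
    """Return (log p(O, q* | model), q*) for a discrete HMM.

    obs: list of symbol indices; pi[i], A[i][j], B[j][k] are the initial,
    transition, and observation probabilities.
    """
    def log(p):
        return math.log(p) if p > 0.0 else float("-inf")

    n = len(pi)
    delta = [log(pi[i]) + log(B[i][obs[0]]) for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        prev, delta, ptr = delta, [], []
        for j in range(n):
            # Best predecessor of state j at this time step.
            best_i = max(range(n), key=lambda i: prev[i] + log(A[i][j]))
            ptr.append(best_i)
            delta.append(prev[best_i] + log(A[best_i][j]) + log(B[j][o]))
        backpointers.append(ptr)
    # Trace the best path backwards from the best final state.
    state = max(range(n), key=lambda i: delta[i])
    score, path = delta[state], [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return score, path[::-1]


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
score, path = viterbi_log_score([0, 1, 2, 2], pi, A, B)
```

Working in the log domain avoids numerical underflow on longer sequences, which is one reason the log quantity is the natural output.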

In the landmine detection problem, observation vectors are generated from GPR measurements as
completely described in [2]. The states are associated with the varying geometry between the GPR
antennas and the object, as depicted in Figure 2. Initially the system may be in a background state. It
then changes to a state in which the sensor receives returns from the mine but is not over it, then to
one in which it is over the mine, then to one in which it is moving away from the mine but still
receiving returns from it, and finally returns to a background state.

[Figure 2: state sequence BACKGROUND, MINE STATE 1, MINE STATE 2, MINE STATE 3, BACKGROUND.]

Figure 2. Depiction of a mine HMM for landmine detection.

There is one observation vector per channel per measurement. Sequences of 15 observation vectors,
corresponding to approximately 0.75 meter, are used as inputs to the HMM. For a given experiment, there
are two models: a mine model and a background model. The output of the HMM landmine detection
system is a confidence value given by


$$C = \log p(O, q^* \mid \lambda_{mine}) - \log p(O, q^* \mid \lambda_{background}) = \log \frac{p(O, q^* \mid \lambda_{mine})}{p(O, q^* \mid \lambda_{background})} \tag{7}$$

3.2. Training

Training  an  HMM  is  the  process  of  estimating  the  parameters  of  the  HMM  using  a  training  set.  The 
standard  method  of  training  HMMs  is  the  Baum-Welch  method,  which  is  well-described  in  the  literature. 
In  addition  to  Baum-Welch,  it  has  been  shown  that  Minimum  Classification  Error  (MCE)  training,  also 
referred  to  as  discriminative  training,  improves  performance  of  HMMs  for  landmine  detection  [3].  These 
methods  are  described  in  the  literature  but  are  included  here  for  completeness.  In  addition,  the  MCE 
training  for  landmine  detection  discussed  in  the  literature  is  for  discrete  HMMs  only.  The  results  shown 
in  this  document  are  the  first  reported  results  using  MCE  training  with  continuous  HMMs  for  landmine 
detection. 


3.2.1.  Baum-Welch  Training 

To solve the learning problem, Baum and his colleagues defined an auxiliary function [4]:

$$Q(\lambda, \bar{\lambda}) = \sum_{q} p(q \mid O, \lambda) \log p(O, q \mid \bar{\lambda}) \tag{8}$$

where $\bar{\lambda}$ is the auxiliary variable corresponding to $\lambda$. They also proved that if the value of
$Q(\lambda, \bar{\lambda})$ increases, then the value of $p(O \mid \bar{\lambda})$ also increases, i.e.

$$Q(\lambda, \bar{\lambda}) \ge Q(\lambda, \lambda) \;\Rightarrow\; p(O \mid \bar{\lambda}) \ge p(O \mid \lambda) \tag{9}$$

Thus, the problem of maximizing $p(O \mid \lambda)$ is replaced by maximizing $Q(\lambda, \bar{\lambda})$ with respect to
$\bar{\lambda}$. Two auxiliary variables are defined for use in Baum-Welch training. The first variable is defined
as the probability of the $t$-th observation being in state $S_i$ and the $(t+1)$-th observation being in
state $S_j$. Formally,

$$\xi_t(i, j) = p(q_t = S_i, q_{t+1} = S_j \mid O, \lambda) \tag{10}$$

$$\xi_t(i, j) = \frac{p(q_t = S_i, q_{t+1} = S_j, O \mid \lambda)}{p(O \mid \lambda)} \tag{11}$$

Using forward and backward variables this can be expressed as

$$\xi_t(i, j) = \frac{\alpha_t(i) \, a_{ij} \, \beta_{t+1}(j) \, b_j(o_{t+1})}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i) \, a_{ij} \, \beta_{t+1}(j) \, b_j(o_{t+1})} \tag{12}$$


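The second variable is the a posteriori probability, which is the probability of the $t$-th observation
being in state $S_i$:

$$\gamma_t(i) = p(q_t = S_i \mid O, \lambda) \tag{13}$$

Using forward and backward variables we can rewrite equation (13) as

$$\gamma_t(i) = \frac{\alpha_t(i) \, \beta_t(i)}{\sum_{i=1}^{N} \alpha_t(i) \, \beta_t(i)} \tag{14}$$

One can see that the relationship between $\gamma_t(i)$ and $\xi_t(i, j)$ is given by

$$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j), \quad 1 \le i \le N, \; 1 \le t \le T - 1 \tag{15}$$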

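In the Baum-Welch learning process, the parameters of a discrete HMM are updated in such a way as to
maximize $p(O \mid \lambda)$. Assuming a starting model $\lambda = (A, B, \pi)$, one calculates the following so-called
re-estimation formulas:

$$\bar{\pi}_i = \gamma_1(i), \quad 1 \le i \le N \tag{16}$$

$$\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \quad 1 \le i, j \le N \tag{17}$$

$$\bar{b}_j(k) = \frac{\sum_{t=1,\, o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}, \quad 1 \le j \le N, \; 1 \le k \le M \tag{18}$$

Re-estimation formulas have also been derived for the continuous density case [5][6].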
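The re-estimation procedure for the discrete case can be sketched as a single Baum-Welch pass over one observation sequence. This is a minimal illustration, assuming unscaled forward and backward variables (adequate only for short sequences) and strictly positive starting parameters; the function names are hypothetical.

```python
def forward(obs, pi, A, B):
    """alpha[t][i]: probability of o_1..o_t and being in state i at time t."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(obs, A, B):
    """beta[t][i]: probability of o_{t+1}..o_T given state i at time t."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def baum_welch_step(obs, pi, A, B):
    """One re-estimation pass; returns new (pi, A, B) and the old p(O | model)."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    p_obs = sum(alpha[-1])
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(n)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(n)]
             for i in range(n)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T)) for k in range(m)]
             for j in range(n)]
    return new_pi, new_A, new_B, p_obs
```

Repeating the step until the likelihood stops improving gives the usual training loop; a practical implementation would add scaling or log-domain arithmetic and would pool statistics over many training sequences.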


3.2.2. Minimum Classification Error, or Discriminative, Training

Minimum Classification Error (MCE) with Generalized Probabilistic Descent was first proposed by
Juang and Katagiri [7], based on an earlier approach by Amari [8]. It has been widely applied to several
classifier structures, such as Multi-Layer Perceptrons [9] and Hidden Markov Models [7]. The essential
aspect of this approach is to train the classifier structure so as to minimize the classification error rate
using a gradient-based method. However, the error rate involves a discontinuous classification
loss, since the classification is either correct or incorrect. This makes it difficult to apply gradient-based
optimization techniques, which require that the objective loss function be at least first-order
differentiable. The strategy of MCE is to smooth the discontinuous classification loss function, while still
staying close to it, and then to use a gradient-based adaptation method.

Consider a set of observations $X = \{x_1, x_2, x_3, \ldots, x_N\}$, where $x_i$ is from one of the $M$ classes
$C_j$, $j = 1, 2, \ldots, M$, a classifier parameter set $\Lambda$, $M$ discriminant functions $g_j(x; \Lambda)$, and the
decision rule:

$$C(x) = C_i, \quad \text{if } g_i(x; \Lambda) = \max_j g_j(x; \Lambda) \tag{19}$$

A general misclassification measure for the $i$-th class sample can be defined as [7]:

$$d_i(x) = -g_i(x; \Lambda) + \left[ \frac{1}{M - 1} \sum_{j,\, j \ne i} g_j(x; \Lambda)^{\eta} \right]^{1/\eta} \tag{20}$$

where $\eta$ is a positive number that controls the contribution of each misclassification towards the error
metric. Note that when $\eta$ is large, the most confusable class contributes the most to the summation
component. The cost function can be defined as [7]:

$$\ell_i(x; \Lambda) = \ell_i(d_i(x)) \tag{21}$$

There are many choices for the loss $\ell_i(d_i)$, for instance the sigmoid function:

$$\ell_i(d_i) = \frac{1}{1 + e^{-\gamma d_i + \theta}} \tag{22}$$

[Figure 3 plot: loss function with theta = 0.]

Figure 3: Sigmoid Loss Function

One can see that this loss function approximates an ideal binary loss function well and is continuous,
which makes it suitable for gradient algorithms. A negative $d_i(x)$ indicates correct classification, and
no loss is incurred; a positive $d_i(x)$ leads to a loss, which can be used to count classification errors.
Thus the overall expected loss $L(\Lambda)$ is [7]:

$$L(\Lambda) = \sum_{k} P(C_k) \int \ell_k(x; \Lambda) \, p(x \mid C_k) \, dx \tag{23}$$

where $P(C_k)$ and $p(x \mid C_k)$ are the class a priori and conditional probabilities respectively.

Generalized Probabilistic Descent (GPD) is used to train HMMs to minimize the expected classification
error, that is, to minimize the overall expected loss $L(\Lambda)$. Since the distributions are unknown,
the expected loss is not known. A solution to this difficulty is given by Amari's Probabilistic Descent
Theorem [8], which shows that for an infinite sequence of random samples $\{x_t\}_{t=1}^{\infty}$ and a step size
sequence $\varepsilon_t$ that satisfies the conditions

$$\sum_{t=1}^{\infty} \varepsilon_t \to \infty \tag{24}$$

and

$$\sum_{t=1}^{\infty} \varepsilon_t^2 < \infty \tag{25}$$

adapting the system parameters according to

$$\Lambda_{t+1} = \Lambda_t - \varepsilon_t \, U \, \nabla \ell(x_t; \Lambda_t) \tag{26}$$

where $\Lambda_t$ denotes the parameters $\Lambda$ at $t$, converges with probability one to a local minimum of
$L(\Lambda)$. $U$ is a positive definite matrix which allows one to scale the learning rate differently for
different model parameters. This is important when models are more sensitive to some parameters than
others, as is the case for continuous HMMs. Continuous HMMs are much more sensitive to the covariance
parameters than to the means, and therefore different learning rates should be used. The simplest
example for $U$ is the identity matrix.
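The parameter update and the step-size conditions can be illustrated with a toy example. The sketch below assumes a diagonal scaling matrix and the step-size schedule eps_t = eps0 / t, whose sum diverges while the sum of its squares converges; the schedule, the quadratic toy objective, and all names are assumptions chosen for illustration rather than settings from the experiments.

```python
def step_size(t, eps0=1.0):
    """eps_t = eps0 / t for t = 1, 2, ...: sum diverges, sum of squares converges."""
    return eps0 / t


def gpd_update(params, grad, eps, u_diag):
    """One descent step with a diagonal scaling matrix U (one scale per parameter)."""
    return [p - eps * u * g for p, u, g in zip(params, u_diag, grad)]


# Toy objective (w - 3)^2 with a single parameter and U = identity.
w = [0.0]
for t in range(1, 201):
    grad = [2.0 * (w[0] - 3.0)]
    w = gpd_update(w, grad, step_size(t, eps0=0.4), u_diag=[1.0])
```

Using different entries in `u_diag` is how one would give, for example, covariance parameters a smaller effective learning rate than means.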


Equation (26) is used to update the parameters of the HMMs for discriminative training. Specifically, the
discriminant function is

$$g_M(O, \lambda_M) = \log \pi^M_{q_1^M} + \sum_{t=2}^{T} \log a^M_{q_{t-1}^M q_t^M} + \sum_{t=1}^{T} \log b^M_{q_t^M}(o_t)$$

for the mine model (the background model is similar). In this expression, $\{q_1^M, q_2^M, \ldots, q_T^M\}$ is the
optimal state sequence found by the Viterbi algorithm.


If $O$ is a misclassified background sequence, the misclassification measure is:

$$d_B(O) = g_M(O, \lambda_M) - g_B(O, \lambda_B) \tag{27}$$

and the related loss function is:

$$\ell_B(O) = \ell(d_B(O)) \tag{28}$$

If $O$ is a misclassified mine sequence, the misclassification measure is:

$$d_M(O) = g_B(O, \lambda_B) - g_M(O, \lambda_M) \tag{29}$$

and the related loss function is:

$$\ell_M(O) = \ell(d_M(O)) \tag{30}$$

The overall loss function is:

$$\ell(O) = \begin{cases} \ell_M(O) & \text{if } O \in C_M \\ \ell_B(O) & \text{if } O \in C_B \end{cases} \tag{31}$$
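The two-class loss can be sketched as follows. This is an illustration only: it assumes the sigmoid loss has the common form 1 / (1 + exp(-gamma * d + theta)). The function names and the default gamma = 1 are assumptions, while theta = 0 matches the setting shown for Figure 3.

```python
import math


def sigmoid_loss(d, gamma=1.0, theta=0.0):
    """Smooth 0-1 loss: near 0 for d << 0 (correct), near 1 for d >> 0 (error)."""
    z = -gamma * d + theta
    if z > 700.0:          # exp(z) would overflow; the loss is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(z))


def mce_loss(g_mine, g_background, is_mine, gamma=1.0, theta=0.0):
    """Loss for one sequence, given the two discriminant (log-score) values."""
    if is_mine:
        d = g_background - g_mine      # positive when a mine is misclassified
    else:
        d = g_mine - g_background      # positive when background is misclassified
    return sigmoid_loss(d, gamma, theta)
```

For example, a mine sequence whose mine score exceeds its background score by 10 yields a loss close to zero, while the same scores on a background sequence yield a loss close to one.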


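In order to maintain the original HMM constraints, such as:

$$\sum_{j=1}^{N} a_{ij} = 1 \;\; \forall i = 1, \ldots, N, \quad \text{and} \quad \sum_{i=1}^{N} \pi_i = 1 \tag{32}$$

the following transformations are introduced:

$$a_{ij} = \frac{e^{\tilde{a}_{ij}}}{\sum_{j=1}^{N} e^{\tilde{a}_{ij}}}, \qquad \pi_i = \frac{e^{\tilde{\pi}_i}}{\sum_{j=1}^{N} e^{\tilde{\pi}_j}} \tag{33}$$

The mine and background models are then updated according to equation (26).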
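One common way to keep probabilities valid during gradient updates is to update unconstrained parameters and map them back with an exponential normalization (softmax). The sketch below assumes that form of transformation; the function names are hypothetical.

```python
import math


def softmax(row):
    """Map unconstrained reals to positive values that sum to 1."""
    peak = max(row)                       # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def to_unconstrained(probs):
    """One inverse of softmax: logs of strictly positive probabilities."""
    return [math.log(p) for p in probs]


# Gradient steps act on the unconstrained values; softmax restores the constraints.
row = to_unconstrained([0.7, 0.2, 0.1])
row = [v - 0.05 * g for v, g in zip(row, [0.3, -0.1, 0.4])]  # hypothetical gradient
updated = softmax(row)
```

Whatever step is taken in the unconstrained space, the mapped row stays non-negative and sums to one, so no projection or clipping is needed after an update.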


4.  Experimental  Results 

4.1.  Experiments 

The EFGPR made four passes over the calibration grid with different gain settings. We used two
strategies for training and testing. The first strategy involved training on one pass over the calibration
grid and testing on another. The second strategy involved a modified leave-one-out method, in which a
given pass was iteratively divided into training and testing sets by leaving one sample out of the training
set on each iteration. Each sample is left out once and only once. The feature sequences from each grid
square were treated as one sample. Thus there are 125 samples in one data set. For each iteration, the
initial models were trained on the training subset. The trained models were tested on the testing subset.
The trained models then served as the initial models for the next run. The procedure was run 125 times,
until each sample had been tested once and only once. Finally, these models were tested on the other
3 passes.

The first strategy has the advantage of different looks at the calibration lane and the disadvantage that
different hardware settings were used to collect the data. The second strategy has the advantage of using
the same hardware settings and that each grid square was used as a test sample exactly once and was
independent of the training data. The first strategy will be referred to as the train one/test three strategy
and the second as the leave-one-out strategy. Discrete and continuous HMMs were investigated, as was
the fusion of discrete and continuous HMMs.
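The data split used by the leave-one-out strategy can be sketched as below. The sketch covers only the splitting; it does not model the detail that the models trained in one iteration serve as the initial models for the next. The function name and the integer stand-ins for samples are assumptions.

```python
def leave_one_out_splits(samples):
    """Yield (training_set, held_out_sample), leaving each sample out once."""
    for k in range(len(samples)):
        yield samples[:k] + samples[k + 1:], samples[k]


grid_squares = list(range(125))   # stand-ins for the 125 per-square samples
splits = list(leave_one_out_splits(grid_squares))
```

Each of the 125 iterations trains on 124 samples and tests on the remaining one, so every grid square is scored by a model that never saw it in that iteration's training set.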


As noted above, the EFGPR collects a three-dimensional array of data corresponding to a physical region
approximately one meter wide and the length of an entire lane. The JUXOCO grid was designed for
sensors to collect data only at the center of the grid. Thus, the HMMs were run only on GPR data
collected near the center of each grid square. More specifically, the HMMs were run on channel
thirteen of the twenty-five channels and at the three down-track positions determined to be closest to the
center of the grid square. Thus, three confidence values were produced for each grid square. The average
of the three values was used as the confidence that a mine was present in a given grid square.

The mine and the background models were applied at each observation vector $O_i^k$, $i = 1, 2, 3$, where
$k$ denotes the $k$-th grid square in the lane. Each model produced an output value for each observation
vector. A confidence value was produced based on the difference of the output values for each
observation vector:

$$C_i^k = \log p(O_i^k, q^* \mid \lambda_{mine}) - \log p(O_i^k, q^* \mid \lambda_{background}) \tag{34}$$

where $i = 1, 2, 3$ and $k = 1, 2, \ldots, 25$.

The average of the three confidence values was associated with the grid square:

$$C^k = \operatorname{avg}_i C_i^k, \quad k = 1, 2, \ldots, 25 \tag{35}$$

Further, the values of $C^k$ were thresholded at various values $t$ to make the final decision:

$$D^k = \begin{cases} 1 & \text{if } C^k > t \\ 0 & \text{otherwise} \end{cases} \tag{36}$$
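The per-square confidence and decision rule can be written out as a short sketch. The function names and the example scores are hypothetical; the logic follows the description above: take the difference of the mine and background log scores at each of the three positions, average, and compare with a threshold.

```python
def square_confidence(mine_scores, background_scores):
    """Average of (mine log score - background log score) over the positions."""
    diffs = [m - b for m, b in zip(mine_scores, background_scores)]
    return sum(diffs) / len(diffs)


def decide(confidence, threshold):
    """Declare a mine (1) when the confidence exceeds the threshold, else 0."""
    return 1 if confidence > threshold else 0


# Hypothetical log scores at the three down-track positions of one grid square.
confidence = square_confidence([-40.0, -38.0, -42.0], [-45.0, -44.0, -43.0])
decision = decide(confidence, threshold=3.0)
```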


Algorithm evaluation is carried out using ROC curves. A mine is detected if $D^k$ is 1 and there is a mine
or mine simulant present in the grid square. A mine is missed if $D^k$ is 0 and there is a mine or mine
simulant in the grid square. A false alarm is generated if $D^k$ is 1 and there is no mine or mine simulant
in the grid square. A background is detected if $D^k$ is 0 and there is no mine or mine simulant in the grid
square. The probability of detection (PD) and the probability of false alarm (PFA) are computed for each
value of the threshold, resulting in a ROC curve. PD is defined as the number of mines detected divided
by the total number of mines and mine simulants in the grid, and PFA as the number of false alarms
divided by the total number of grid squares without a mine or mine simulant.
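A ROC curve of this kind can be computed by sweeping the threshold over the observed confidence values. The following is a minimal sketch with hypothetical names and toy data; it assumes at least one square with a mine and at least one without.

```python
def roc_points(confidences, has_mine):
    """Return [(PFA, PD)] for thresholds swept over the observed confidences."""
    num_mines = sum(has_mine)
    num_empty = len(has_mine) - num_mines
    points = []
    thresholds = [float("-inf")] + sorted(set(confidences))
    for t in thresholds:
        detections = sum(1 for c, m in zip(confidences, has_mine)
                         if m and c > t)
        false_alarms = sum(1 for c, m in zip(confidences, has_mine)
                           if not m and c > t)
        points.append((false_alarms / num_empty, detections / num_mines))
    return points


# Toy data: four grid squares, two of which contain a mine or simulant.
points = roc_points([2.0, 1.0, 0.5, -1.0], [True, False, True, False])
```

Starting the sweep at negative infinity guarantees the curve includes the point where every square is declared a mine, so both PD and PFA equal one there.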
4.2. Results

4.2.1. Continuous HMM Results

The train one/test three strategy was used to perform experiments with continuous HMMs.
Continuous HMM models used in previous AT mine detection projects were used as the initial models.
These models were trained on one pass of data and evaluated on the other three passes. A total of 87 mine
observation vectors and 288 background/clutter observations were used for training. The number of
Gaussian mixture components in the models was varied. Continuous HMM results are shown in Figures
4-6. Since the models were tested on three different passes, the standard deviation in the PD for each PFA
is also shown.

4.2.2. DHMM Results

Two experiments were conducted with DHMM models. The first experiment used the train one/test
three strategy. The second used the leave-one-out strategy. The DHMM models were initialized using
the SOFM strategy described in [2], then trained on one data set and tested on the other three data
sets. This experiment was repeated with different codebook sizes. The results are shown in Figures 7-9.

Figure 8: Results of Discrete HMM with 50 Symbols in the Codebook using the train one/test three strategy.

Figure 9: Results of Discrete HMM with 100 Symbols in the Codebook using the train one/test three strategy.


Discrete models were also trained with the leave-one-out strategy. The results are shown below.

[Figure 10 legend: DHMM, Std Deviation]

Figure 10: Results of Discrete HMM with 25 Symbols in the Codebook using the leave-one-out strategy.

4.2.3. Fusion Results

A simple fusion was applied to the outputs of the CHMMs and DHMMs. The fusion was performed by
averaging the confidence values from each model for each grid square. The CHMM results are shown
in Figure 8. The DHMM model was picked from the DHMM experiment associated with Figure 10. The
fusion results are illustrated in Figure 13.
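The averaging fusion amounts to a single line of arithmetic per grid square, sketched below with hypothetical names and values. Averaging raw confidences implicitly assumes that the two models produce values on comparable scales; the text does not state whether any normalization was applied.

```python
def fuse_confidences(chmm_conf, dhmm_conf):
    """Average the two models' confidence values square by square."""
    if len(chmm_conf) != len(dhmm_conf):
        raise ValueError("both models must score the same grid squares")
    return [(c + d) / 2.0 for c, d in zip(chmm_conf, dhmm_conf)]


# Hypothetical confidences for two grid squares from each model.
fused = fuse_confidences([4.0, -2.0], [2.0, 0.0])
```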


[Figure 13 legend: Continuous, Discrete, Continuous + Discrete]

Figure 13: Results of Fusing the Continuous and the Discrete HMM Compared to Those of the Continuous
and the Discrete HMM respectively

The fused result is better than that of either individual model. For example, at 100% and 90%
PD, the PFA dropped by about % compared to the continuous model. The continuous model
performed a little better than the discrete model. These results can be compared to those of the methods
reported in [10], which were generated by a hand-held GPR on the JUXOCO grid. The HMM with the
EFGPR performs better than the baseline hand-held method that uses pointwise processing, but not as
well as the spatial processing method. However, there are several differences in experimental design
that make the comparison less than conclusive. For example, in [10] the positioning was much more
precise, which is a significant issue in detection and discrimination of AP mines.


5.  Summary 


Continuous and discrete HMM algorithms were applied to GPR data acquired with the EFGPR system on
the JUXOCO calibration grid. The algorithms were trained using Baum-Welch and MCE training. Two
training strategies, the train one/test three and leave-one-out strategies, were employed. Discriminative
training demonstrates improved performance over Baum-Welch training. The best results were obtained
with a fusion of continuous and discrete HMMs. The system achieved Probabilities of Detection of 100%
and 90% with Probabilities of False Alarm of about 40% and 25% respectively. These rates compare
favorably with some published rates in the literature but not as well as the best. We conclude that HMMs
with the existing feature sets perform well for detection of AT mines, but discrimination between mines
and discrete clutter objects requires better feature sets.